Abstract

It is well known that a sparsely coded network, in which the activity level is extremely
low, has intriguing equilibrium properties. In the present work, we study the dynamical
properties of a neural network designed to store sparsely coded sequential patterns rather
than static ones. Applying the theory of statistical neurodynamics, we derive dynamical
equations that describe the retrieval process in terms of macroscopic order parameters
such as the overlap. The theory gives good predictions for the storage capacity and the
basin of attraction obtained through numerical simulations. The results indicate that the
nature of the basin of attraction depends on the method of activity control employed.
Furthermore, robustness against random synaptic dilution deteriorates slightly with the
degree of sparseness.

For the purpose of constructing more realistic mathematical neural network models
(e.g., the Hopfield model [1]), the so-called "random" patterns, which have been used for
simple theoretical treatments, have been reconsidered. In a network processing such
random patterns, it is frequently supposed that, statistically, half of the neurons are
active. However, such a situation is not realistic for two reasons. First, according to
physiological studies, the activity level of real neural systems is thought to be low.
Second, in a meaningful pattern, information is generally encoded by a small fraction of
bits against a background that occupies most of the total area. With these points in mind,
neural networks loading sparsely coded patterns have been studied by many authors
[2, 3, 4, 5, 6, 7]. These authors have reported that the maximal number of patterns stored
in the network increases as the fraction of active neurons $a$ decreases. Furthermore, the
storage capacity in such a situation diverges as $-1/(a \ln a)$, which is the optimal
asymptotic form [8]. However, since the information content of a single pattern decreases
with the degree of sparseness, we cannot immediately conclude that sparse coding enhances
the associative ability. Rather, what we should note is that the optimal bound is obtained
even for models with a relatively simple Hebbian learning rule.

Owing to these studies, progress has been made in understanding the equilibrium properties
of sparsely coded networks, but many unsolved problems remain regarding their dynamical
aspects. To grasp a network's characteristics properly, it is necessary to consider
dynamical properties such as the basin of attraction. In recent years, several theories
treating the retrieval process have been proposed [9, 10, 11, 12, 13, 14]. Among these, we
note that the method of statistical neurodynamics is practically useful, because it enables
us to describe long-term behavior [13, 14]. However, for sparse coding, there is a
quantitative discrepancy between the results obtained from this theory and numerical
simulation in the case of autoassociation, which implies a difficulty in treating the
strong feedback mechanism with this model [15, 16]. On the other hand, sparse coding for
sequential associative memory has not yet been studied in detail [17, 18]. In the present
paper, we study this point by applying the method of statistical neurodynamics to a model
for sequential associative memory.

Let us consider a neural network consisting of $N$ McCulloch-Pitts neurons that is designed
to store sequential patterns rather than static ones. Each neuron obeys discrete
synchronous dynamics described by

$$S_i(t+1) = F[h_i(t)], \tag{1}$$

$$h_i(t) = \sum_{j=1}^{N} J_{ij} S_j(t) - \theta, \tag{2}$$

where $S_i(t)$ and $h_i(t)$ are the state and the internal potential of the $i$th neuron at
time $t$, respectively. Although we have written the transfer function in the general form
$F(u)$, we consider the case $F(u) = \Theta(u)$; i.e., $F(u)$ is a step function. In this
case, the state $S_i(t)$ takes only two values, 1 (firing state) and 0 (resting state). The
quantities $\theta$ and $J_{ij}$ represent the uniform threshold and the strength of the
synaptic connection between the $i$th and $j$th neurons, respectively.

We assume that the stored patterns are generated with the probability
$P(\xi_i^\mu) = a\,\delta(\xi_i^\mu - 1) + (1-a)\,\delta(\xi_i^\mu)$. Here $\xi_i^\mu$ is
the state of the $i$th neuron in the $\mu$th pattern. Then, the pattern activity,
$\frac{1}{N}\sum_i \xi_i^\mu$, assumes an average value of $a$. In particular, the case
$a \to 0$ is referred to as "sparse coding". In order for the network to function as an
associative memory for these patterns, the $J_{ij}$ must be designed appropriately. In the
present paper, to construct a network capable of recalling a sequence of $\alpha N$
patterns, $\xi^1 \to \xi^2 \to \cdots \to \xi^{\alpha N} \to \xi^1$, we adopt covariance
learning,

$$J_{ij} = \frac{1}{a(1-a)N} \sum_{\mu=1}^{\alpha N} (\xi_i^{\mu+1} - a)(\xi_j^{\mu} - a), \tag{3}$$

with $\xi^{\alpha N+1} \equiv \xi^1$, which is usually adopted for learning sparsely coded
patterns.
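The storage prescription and the update rule described so far can be sketched numerically. The following is a minimal illustration under stated assumptions, not the simulation code used for the figures: the function names, the network size, the number of patterns, and the threshold value 0.4 are all chosen here for the example. It builds the couplings by covariance learning for a cyclic sequence and applies the synchronous step-function dynamics.

```python
import numpy as np

def make_patterns(n_neurons, n_patterns, a, rng):
    """Draw binary patterns; each bit is 1 with probability a, else 0."""
    return (rng.random((n_patterns, n_neurons)) < a).astype(float)

def covariance_weights(xi, a):
    """Covariance learning for the cyclic sequence xi^1 -> xi^2 -> ... -> xi^1."""
    n_neurons = xi.shape[1]
    xi_tilde = xi - a
    # Row mu of xi_next holds the pattern that follows pattern mu (cyclically).
    xi_next = np.roll(xi_tilde, -1, axis=0)
    return xi_next.T @ xi_tilde / (a * (1.0 - a) * n_neurons)

def update(state, weights, theta):
    """One synchronous step: S(t+1) = step(J S(t) - theta)."""
    h = weights @ state - theta
    return (h > 0).astype(float)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, theta = 0.1, 0.4          # illustrative values, not fitted to the paper
    xi = make_patterns(2000, 10, a, rng)
    J = covariance_weights(xi, a)
    state = xi[0].copy()         # start exactly on the first pattern
    for step in range(1, 6):
        state = update(state, J, theta)
        match = np.array_equal(state, xi[step % len(xi)])
        print(f"step {step}: state equals next pattern -> {match}")
```

At this very low loading the crosstalk is small compared with the gap between the signal and the threshold, so the state is expected to hop from one pattern to the next at every step.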

For such a network, the macroscopic state is found to be described by the following
order parameters:

$$m^\mu(t) = \frac{1}{a(1-a)N} \sum_{i=1}^{N} (\xi_i^\mu - a) S_i(t), \tag{4}$$

$$x(t) = \frac{1}{aN} \sum_{i=1}^{N} S_i(t). \tag{5}$$

Here, $m^\mu(t)$ is the overlap with the target pattern $\xi^\mu$. As the configuration of
the network approaches the target pattern, this value approaches unity. The function $x(t)$
represents the activity of the network. In studying the retrieval process, we mainly
discuss the time evolution of these parameters.
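As a small illustration, the two order parameters can be evaluated from a network state as follows. This is a sketch only: the function names are ours, and the normalisation of the activity by $aN$ (so that $x = 1$ when the state has the same activity as a stored pattern) is an assumption, chosen to be consistent with the noise term $\alpha a x(t)$ and the initial value $x(0) = 1.0$ used elsewhere in the text.

```python
import numpy as np

def overlap(state, pattern, a):
    """Overlap m between a 0/1 state and a 0/1 pattern with mean activity a."""
    n = state.size
    return float(np.sum((pattern - a) * state) / (a * (1.0 - a) * n))

def activity(state, a):
    """Network activity x, normalised by a*N (assumed convention)."""
    return float(np.sum(state) / (a * state.size))

if __name__ == "__main__":
    a = 0.2
    pattern = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0], dtype=float)
    print(overlap(pattern, pattern, a), activity(pattern, a))
    print(overlap(np.zeros(10), pattern, a), activity(np.zeros(10), a))
```

When the state coincides with a pattern whose fraction of active bits is exactly $a$, both quantities equal one; for the silent state both vanish.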

From this point, we consider the "condensed" situation in which only one overlap is
sizable: $m^p(t) \sim O(1)$ and $m^\mu(t) \sim O(1/\sqrt{N})$ $(\mu \neq p)$. Here, $\xi^p$
is the pattern to be retrieved at time $t$. Then, the internal potential $h_i(t)$ in (2)
can be separated as

$$h_i(t) = \tilde\xi_i^{p+1} m^p(t) - \theta + \frac{1}{a(1-a)N} \sum_{j=1}^{N} \sum_{\mu \neq p}^{\alpha N} \tilde\xi_i^{\mu+1} \tilde\xi_j^{\mu} S_j(t), \tag{6}$$

where we have written $\tilde\xi_i^\mu = \xi_i^\mu - a$. The first and second terms in (6)
are together regarded as the signal that induces recollection of the target pattern
$\xi^{p+1}$ at the subsequent time step, $t+1$, while the remaining term is regarded as
noise. For convenience, we define the noise term $z_i(t)$ as

$$z_i(t) = \frac{1}{a(1-a)N} \sum_{j=1}^{N} \sum_{\mu \neq p}^{\alpha N} \tilde\xi_i^{\mu+1} \tilde\xi_j^{\mu} S_j(t). \tag{7}$$

The quantity $z_i(t)$ is the crosstalk noise from the non-target patterns. The essence of
the theory is to treat the crosstalk noise $z_i(t)$ as Gaussian noise with mean 0 and
variance $\sigma(t)^2$ [13]. It has been confirmed numerically that this assumption is
valid as long as the network succeeds in retrieval [19].
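As a rough check on the size of this noise, its variance can be estimated under a simplifying assumption made here only for illustration: the states $S_j(t)$ are treated as uncorrelated with the non-target patterns, and the activity is taken to be normalised as $x(t) = \frac{1}{aN}\sum_j S_j(t)$. Only the terms with $j = k$ and $\mu = \nu$ then survive the pattern average, and $S_j(t)^2 = S_j(t)$ for binary states:

```latex
\begin{align*}
\langle z_i(t)^2 \rangle
 &= \frac{1}{a^2(1-a)^2 N^2} \sum_{j,k} \sum_{\mu,\nu \neq p}
    \langle \tilde\xi_i^{\mu+1} \tilde\xi_i^{\nu+1} \rangle
    \langle \tilde\xi_j^{\mu} \tilde\xi_k^{\nu} \rangle \, S_j(t) S_k(t) \\
 &\simeq \frac{1}{a^2(1-a)^2 N^2} \, (\alpha N - 1) \, [a(1-a)]^2 \sum_{j} S_j(t) \\
 &\simeq \alpha \, a \, x(t).
\end{align*}
```

This uncorrelated estimate gives only the leading contribution; the correlations neglected here are what the recursion for $\sigma(t)$ is meant to capture.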

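Now we derive the dynamical equations for the overlap $m(t)$ and the activity $x(t)$.
The definition of the overlap leads to the equation

$$m^{p+1}(t+1) = \frac{1}{a(1-a)N} \sum_{i=1}^{N} \left\langle\!\left\langle \tilde\xi_i^{p+1} F\!\left[\tilde\xi_i^{p+1} m^p(t) - \theta + z_i(t)\right] \right\rangle\!\right\rangle_{\xi}, \tag{8}$$

where $\langle\langle \cdots \rangle\rangle_{\xi}$ denotes the average over the stored
patterns. In the same way, we can write the equation for the activity $x(t)$,

$$x(t+1) = \frac{1}{aN} \sum_{i=1}^{N} \left\langle\!\left\langle F\!\left[\tilde\xi_i^{p+1} m^p(t) - \theta + z_i(t)\right] \right\rangle\!\right\rangle_{\xi}. \tag{9}$$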
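For the step function, the averages above can be carried out explicitly. The short calculation below is a sketch for the overlap, assuming that $z$ is a zero-mean Gaussian variable with variance $\sigma(t)^2$ and using $\mathrm{Prob}\{z > u\} = \frac{1}{2}\operatorname{erfc}\big(u/(\sqrt{2}\sigma)\big)$ together with $\operatorname{erfc}(-y) = 2 - \operatorname{erfc}(y)$. A neuron with $\xi_i^{p+1} = 1$ (probability $a$, weight $1-a$) fires when $z > \theta - (1-a)m(t)$, and a neuron with $\xi_i^{p+1} = 0$ (probability $1-a$, weight $-a$) fires when $z > \theta + a m(t)$:

```latex
\begin{align*}
m(t+1)
 &= \frac{1}{a(1-a)} \Big[ a(1-a)\,\mathrm{Prob}\{ z > \theta - (1-a)m(t) \}
    - (1-a)a\,\mathrm{Prob}\{ z > \theta + a\,m(t) \} \Big] \\
 &= \frac{1}{2}\operatorname{erfc}\!\left( \frac{\theta - (1-a)m(t)}{\sqrt{2}\,\sigma(t)} \right)
  - \frac{1}{2}\operatorname{erfc}\!\left( \frac{\theta + a\,m(t)}{\sqrt{2}\,\sigma(t)} \right) \\
 &= 1 - \frac{1}{2}\left[
    \operatorname{erfc}\!\left( \frac{(1-a)m(t) - \theta}{\sqrt{2}\,\sigma(t)} \right)
  + \operatorname{erfc}\!\left( \frac{a\,m(t) + \theta}{\sqrt{2}\,\sigma(t)} \right) \right].
\end{align*}
```

The activity follows in the same way, with the weights $1-a$ and $-a$ replaced by 1 and the prefactor $1/[a(1-a)]$ replaced by $1/a$.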



Next, we examine the time development of the variance $\sigma(t)^2$. Expressing $z_i(t+1)$
as

$$z_i(t+1) = \frac{1}{a(1-a)N} \sum_{j=1}^{N} \sum_{\mu \neq p+1}^{\alpha N} \tilde\xi_i^{\mu+1} \tilde\xi_j^{\mu} F[h_j(t)], \tag{10}$$

we must consider the dependence of $h_j(t)$ on $\xi_j^\mu$ when summing over $\mu$. In the
internal potential $h_j(t)$, the term $\tilde\xi_j^{\mu} m^{\mu-1}(t)$ is estimated to be
$O(1/\sqrt{N})$. Therefore, we expand the function $F[h_j(t)]$, obtaining



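$$z_i(t+1) = \frac{1}{a(1-a)N} \sum_{j=1}^{N} \sum_{\mu \neq p+1}^{\alpha N} \tilde\xi_i^{\mu+1} \tilde\xi_j^{\mu} S_j^{(\mu)}(t+1) + U(t)\, z_i(t), \tag{11}$$

$$U(t) = \frac{1}{N} \sum_{j=1}^{N} F'[h_j^{(\mu)}(t)], \tag{12}$$

where $h_j^{(\mu)}(t) = h_j(t) - \tilde\xi_j^{\mu} m^{\mu-1}(t)$ is the internal potential
without the contribution of the pattern $\xi^\mu$, and $S_j^{(\mu)}(t+1) = F[h_j^{(\mu)}(t)]$.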



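We now assume that $S_j^{(\mu)}(t+1)$ in (11) is independent of $\xi_j^\mu$. Squaring (11),
we obtain

$$z_i(t+1)^2 = \alpha a x(t+1) + U(t)^2 z_i(t)^2 + \frac{2U(t)}{a(1-a)N} \sum_{j=1}^{N} \sum_{\mu \neq p+1}^{\alpha N} \tilde\xi_i^{\mu+1} \tilde\xi_j^{\mu} S_j^{(\mu)}(t+1)\, z_i(t). \tag{13}$$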



Here the first and second terms in (13) come from the squares of the first and second
terms in (11), respectively. The last term in (13) arises from the product of the first
and second terms in (11). For the same reason, the term $S_j(t)$ contained in (13) must be
expanded. Following this procedure iteratively, we can take into account temporal
correlations up to the initial time. Averaging (13), the equation for $\sigma(t)$ takes
the form



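$$\sigma(t+1)^2 = \alpha a x(t+1) + U(t)^2 \sigma(t)^2 + \sum_{n=1}^{t+1} C(t+1, t+1-n), \tag{14}$$

where $C(t+1, t+1-n)$ collects the correlations between the noise at times $t+1$ and
$t+1-n$. It carries the prefactor $1/[a^2(1-a)^2 N]$ and is a sum over sites $j$ of terms
proportional to the pattern correlation

$$E\!\left[\tilde\xi^{\mu+1}\, \tilde\xi^{\mu+1-n}\right]. \tag{15}$$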



Since $\xi^{\mu}$ and $\xi^{\mu-n}$ are independent of each other, except when
$n = \alpha N, 2\alpha N, 3\alpha N, \ldots$, the last summation in the above equation
vanishes. Although the correlations for $n = \alpha N, 2\alpha N, 3\alpha N, \ldots$
remain, their effect can be regarded as negligible in the limit $N \to \infty$.
Finally, we obtain

$$\sigma(t+1)^2 = \alpha a x(t+1) + U(t)^2 \sigma(t)^2. \tag{16}$$

This derivation is essentially equivalent to that by Amari [17].

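Consequently, the behavior of the network is described by the equations

$$m(t+1) = 1 - \frac{1}{2}\left[\operatorname{erfc}(\phi_1) + \operatorname{erfc}(\phi_0)\right], \tag{17}$$

$$x(t+1) = 1 - \frac{1}{2}\left[\operatorname{erfc}(\phi_1) - \frac{1-a}{a}\operatorname{erfc}(\phi_0)\right], \tag{18}$$

$$\sigma(t+1)^2 = \alpha a x(t+1) + \frac{1}{2\pi}\left[a \exp(-\phi_1^2) + (1-a)\exp(-\phi_0^2)\right]^2, \tag{19}$$

with

$$\phi_1 = \frac{(1-a)m(t) - \theta}{\sqrt{2}\,\sigma(t)}, \tag{20}$$

$$\phi_0 = \frac{a m(t) + \theta}{\sqrt{2}\,\sigma(t)}, \tag{21}$$

where we have set $F(u) = \Theta(u)$ and replaced the site average
$\frac{1}{N}\sum_i \cdots$ with the average over the Gaussian noise
$\langle \cdots \rangle_z$ in the limit $N \to \infty$. For initial values, we can set
$\sigma(0) = \sqrt{\alpha a x(0)}$ and choose arbitrary values for $m(0)$ and $x(0)$.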
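The macroscopic map can be iterated directly. The sketch below is an illustration rather than the procedure used to produce the figures: the function names, the number of iterations, and the small floor on $\sigma$ (added only to avoid division by zero) are assumptions. It implements the updates of $m$, $x$ and $\sigma$ for the step-function neuron, starting from $\sigma(0) = \sqrt{\alpha a x(0)}$.

```python
import math

def retrieval_step(m, x, sigma, a, alpha, theta):
    """One iteration of the macroscopic equations for a step-function neuron."""
    root2_sigma = math.sqrt(2.0) * sigma
    phi1 = ((1.0 - a) * m - theta) / root2_sigma
    phi0 = (a * m + theta) / root2_sigma
    m_next = 1.0 - 0.5 * (math.erfc(phi1) + math.erfc(phi0))
    x_next = 1.0 - 0.5 * (math.erfc(phi1) - (1.0 - a) / a * math.erfc(phi0))
    # u_sigma equals U(t) * sigma(t) for the step function.
    u_sigma = (a * math.exp(-phi1 ** 2)
               + (1.0 - a) * math.exp(-phi0 ** 2)) / math.sqrt(2.0 * math.pi)
    var_next = max(alpha * a * x_next + u_sigma ** 2, 0.0)
    # Floor on sigma is a numerical safeguard, not part of the theory.
    return m_next, x_next, max(math.sqrt(var_next), 1e-12)

def iterate(m0, x0, a, alpha, theta, n_steps=100):
    """Iterate the map from (m0, x0); requires alpha * a * x0 > 0."""
    m, x = m0, x0
    sigma = max(math.sqrt(alpha * a * x0), 1e-12)
    for _ in range(n_steps):
        m, x, sigma = retrieval_step(m, x, sigma, a, alpha, theta)
    return m, x, sigma

if __name__ == "__main__":
    # a and theta follow the values quoted in the caption of figure 1;
    # the two loading rates are arbitrary illustrative choices.
    for alpha in (0.1, 2.0):
        m, x, sigma = iterate(1.0, 1.0, a=0.1, alpha=alpha, theta=0.47)
        print(f"alpha={alpha}: m={m:.4f}, x={x:.4f}, sigma={sigma:.4f}")
```

Scanning the initial overlap $m(0)$ at fixed loading rate with such a routine is one way to trace the boundary of the basin of attraction; the criterion for calling a run a successful retrieval is left to the user.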

In a sparsely coded network, activity control is an important factor for good retrieval
quality. By introducing a global inhibitory interaction such as

$$J_{ij}^{\mathrm{inh}} = J_{ij} - \frac{g}{aN}, \tag{22}$$

the activity can be dynamically controlled [3, 7]. The second term contributes as a global
inhibitory interaction, and $g$ represents its strength. If the activity level of the
network at time $t$, $x(t)$, increases greatly, each neuron receives a stronger inhibitory
signal $-g x(t)$, so that $x(t+1)$ decreases. The retrieval process in this case can be
treated in a manner similar to that above. We then find that equations (20) and (21) are
modified as

$$\phi_1 = \frac{(1-a)m(t) - g x(t) - \theta}{\sqrt{2}\,\sigma(t)}, \tag{23}$$

$$\phi_0 = \frac{a m(t) + g x(t) + \theta}{\sqrt{2}\,\sigma(t)}. \tag{24}$$
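That a uniform inhibitory offset on every synapse acts as a feedback $-g x(t)$ on each neuron can be checked numerically. The sketch below uses hypothetical function names, random placeholder couplings, and the activity normalised by $aN$ (an assumption); it compares the potential computed from the shifted couplings with the potential computed from the original couplings minus $g x(t)$.

```python
import numpy as np

def potential_from_shifted_weights(weights, state, theta, g, a):
    """Internal potential using couplings shifted by -g/(aN) on every synapse."""
    n = state.size
    w_inh = weights - g / (a * n)
    return w_inh @ state - theta

def potential_with_feedback(weights, state, theta, g, a):
    """Same potential written as the original couplings plus feedback -g*x."""
    x = state.sum() / (a * state.size)   # normalised activity (assumed definition)
    return weights @ state - g * x - theta

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, a, g, theta = 200, 0.1, 0.56, 0.0
    weights = rng.normal(size=(n, n)) / n          # placeholder couplings
    state = (rng.random(n) < a).astype(float)
    h1 = potential_from_shifted_weights(weights, state, theta, g, a)
    h2 = potential_with_feedback(weights, state, theta, g, a)
    print(np.allclose(h1, h2))
```

Because the two forms agree, the inhibition enters the macroscopic equations only through the shift $\theta \to \theta + g x(t)$.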



Another model possessing an activity control mechanism is that with a time-dependent
threshold, which is calculated at each time step so that the activity of the network is
kept the same as that of the retrieved pattern [15]. Recently, as an improved model, a
"self-control" model has been proposed [16]. In this model, the time-dependent threshold
$\theta(t)$ adapts itself according to the activity $a$ and the variance of the crosstalk
noise $\sigma(t)^2$. If $a$ is sufficiently small, it takes the form
$\theta(t) = \sigma(t)\sqrt{-2 \ln a}$. However, from the biological point of view, it is
not plausible that the network monitors a statistical quantity of the crosstalk noise.
Hence, in the present paper, in place of $\sigma(t)$, we use its leading term,
$\sqrt{\alpha a x(t)}$. Then, we simply use

$$\theta(t) = \sqrt{-2\,\alpha a\, x(t) \ln a} \tag{25}$$

in place of the original expression.
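As a small worked example, the simplified threshold can be evaluated from the current activity alone. The helper below is a sketch with a hypothetical name; it only assumes $0 < a < 1$, so that $\ln a < 0$ and the argument of the square root is non-negative for $x \ge 0$.

```python
import math

def self_control_threshold(x, a, alpha):
    """Threshold sqrt(-2 * alpha * a * x * ln a) built from the activity x only."""
    if not 0.0 < a < 1.0:
        raise ValueError("activity level a must lie strictly between 0 and 1")
    return math.sqrt(-2.0 * alpha * a * x * math.log(a))

if __name__ == "__main__":
    # Illustrative values: a = 0.1, alpha = 0.1, network activity x = 1.
    print(self_control_threshold(1.0, a=0.1, alpha=0.1))
```

The threshold grows with the square root of the activity, so a burst of excess firing raises the bar for firing at the next step.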

We now compare our theoretical results with numerical simulations. Figures 1-3 display the
results for the model using only a uniform threshold $\theta$, the model using a uniform
threshold $\theta$ together with the inhibitory interaction $g$, and the model using a
self-control threshold $\theta(t)$, respectively. In the first two cases, $\theta$ and $g$
are optimized so as to maximize the storage capacity. The theoretical curves provide a good
prediction of the retrieval properties of the network. Although these three cases differ
very little with respect to storage capacity, the differences among the activity control
methods are reflected in the shapes of the basins of attraction. While the basin gradually
narrows as $\alpha$ increases in the first case, the basin for $\alpha > 0$ is wider than
that for $\alpha = 0$ in the second case. Furthermore, in the last case, the minimum
initial overlap for which the network succeeds in retrieval becomes zero when $\alpha = 0$.

Next, we investigate robustness against random synaptic dilution. In this case, a
randomly diluted synapse is represented using the random variable $c_{ij}$:

$$\tilde J_{ij} = \frac{c_{ij}}{c} J_{ij}. \tag{26}$$

The variable $c_{ij}$ takes the value 1 with probability $c$, and is 0 otherwise. In other
words, $c$ represents the ratio of connected synapses. It is known that random synaptic
dilution can be statistically regarded as static noise in a synapse [20], and ultimately
plays the role of a static noise additional to the crosstalk noise in the retrieval
dynamics [21]. Therefore,
the resultant equation for the noise is modified as 

$$\tilde\sigma(t+1)^2 = \sigma(t+1)^2 + \alpha a\, \frac{1-c}{c}. \tag{27}$$



The last term in (27) is attributed to the variance of the synaptic noise caused by
dilution. In addition, $\phi_1$ and $\phi_0$ become

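$$\phi_1 = \frac{(1-a)m(t) - \theta}{\sqrt{2}\,\tilde\sigma(t)}, \tag{28}$$

$$\phi_0 = \frac{a m(t) + \theta}{\sqrt{2}\,\tilde\sigma(t)}. \tag{29}$$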
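The view of dilution as static synaptic noise can be illustrated with a short numerical check. The sketch below rests on an assumption that is ours: the diluted couplings are taken to be rescaled by $1/c$ so that their mean is unchanged. Under that assumption the multiplicative factor $c_{ij}/c$ has mean 1 and variance $(1-c)/c$, which is the factor multiplying the dilution term in the noise. Function names, sample size, and seed are illustrative.

```python
import numpy as np

def dilution_mask(n_rows, n_cols, c, rng):
    """Bernoulli mask: each entry is 1 with probability c, else 0."""
    return (rng.random((n_rows, n_cols)) < c).astype(float)

def rescaled_mask_variance(c, n_samples=1_000_000, seed=0):
    """Empirical variance of c_ij / c, to compare with (1 - c) / c."""
    rng = np.random.default_rng(seed)
    mask = dilution_mask(1, n_samples, c, rng)
    return float(np.var(mask / c))

if __name__ == "__main__":
    for c in (0.5, 0.8):
        print(c, rescaled_mask_variance(c), (1.0 - c) / c)
```

The empirical variance should approach $(1-c)/c$ as the number of samples grows, vanishing for the fully connected network $c = 1$.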



In order to examine the deterioration caused by a decrease in the ratio of connection $c$
at each activity $a$, we define the normalized storage capacity
$\alpha_c^*(c) = \alpha_c(c)/\alpha_c(1)$, where $\alpha_c(c)$ is the storage capacity when
the ratio of connection is $c$. Figure 4 displays the normalized storage capacity
$\alpha_c^*(c)$ as a function of $c$. As indicated by these results, even if the activity
level $a$ becomes small, the shape of the curve representing the degree of deterioration
does not change significantly. However, as $a$ becomes small, the storage capacity comes to
decrease almost linearly with the increase in the degree of dilution, $1 - c$. With regard
to the basin of attraction, the model with the optimized uniform threshold $\theta$, which
has the narrowest basin, is the most robust of the three.

Finally, we briefly mention the dependence of the storage capacity on the activity level
$a$. In the present case as well, we have numerically confirmed that the storage capacity
diverges as $-1/(a \ln a)$ in the limit $a \to 0$, although it seems to approach this
asymptotic form quite slowly.

Figure 1: From the top, the equilibrium activity (dashed curve), equilibrium overlap
(dotted curve), and basin of attraction (full curve) for $a = 0.1$ and
$\theta = \theta_{\mathrm{opt}}$ $(= 0.47)$. The ordinate is the overlap $m$ or the
activity $x$, and the abscissa is the loading rate $\alpha$. The data points indicate
simulation results with $N = 2000$ for 20 trials. We take the initial activity as
$x(0) = 1.0$. The inset shows the dependence of the storage capacity $\alpha_c$ on the
uniform threshold $\theta$. The value at the peak of the curve corresponds to
$\theta_{\mathrm{opt}}$.

Figure 2: A plot similar to that in figure 1 for the case $\theta = 0$ and
$g = g_{\mathrm{opt}}$ $(= 0.56)$. The other parameters are the same as in figure 1. The
inset shows the dependence of the storage capacity $\alpha_c$ on the inhibitory
interaction $g$ when $\theta = 0$. The value at the peak of the curve corresponds to
$g_{\mathrm{opt}}$.

Figure 3: A plot similar to that in figure 1 for the self-control model. The other
parameters are the same as in figure 1.

Figure 4: Dependence of the normalized storage capacity $\alpha_c^*(c)$ on the ratio of
connected synapses $c$ for $a = 0.5, 0.1, 0.001$.